Analysis:

First we import and merge the data from three different sources:

Then we compute several features that could capture synesthete consistency, in several parts: (A) we replicate features found in the literature (i.e. Van Petersen et al. (2020b)); (B) we extract features based on the form; (C) we harness a geography package to compute segment-based features; (D) we compute polygon-based features; (E) convex hull; (F) angles.

Finally we have a summary table presenting:

In addition, we run some (not very successful, in my opinion) machine learning to find out which feature combination could best diagnose space sequence synesthesia.

Introduction

*Synesthesia* is the concomitant perception of two different senses: for example, certain humans perceive numbers as having a well-defined position in space. Synesthetes come in all colours and flavours, such as …

Space Sequence Synesthesia (SSS) is a phenomenon present in some humans who perceive a spatial property for some stimuli. One of the first reports of this phenomenon describes a particular spatial placement for numerals (Galton 1880).

A strict definition of synesthesia requires these five criteria:

  1. Automaticity: the inducer automatically triggers the concurrent. For example, February might automatically trigger a specific location in the top-left peri-space, analogously to how a colour word activates the colour concept in human literates (see the Stroop effect).

  2. Unidirectionality: the inducer triggers the concurrent, but the concurrent does not trigger the inducer. For example, if February automatically triggers the top-left peri-space, the top-left peri-space does not trigger February.

  3. Consciousness: The experience is conscious. For example a synesthete is conscious of his or her perception of February in the top left peri-space.

  4. Development: the association should be present early in development. For example, seeing months in particular spatial locations already occurred as a child.

  5. Consistency: the inducer-concurrent pair is stable in time. For example, February is perceived on the top left regardless of the time of day or age (although some changes might occur with aging).

Consistency is the criterion most suited for experimental settings, since it can be tested by repeatedly presenting specific inducers to participants and collecting the responses for their concurrents. If comparatively similar responses are given for the same inducers, then synesthesia can be detected. Such tests have become the gold standard to detect synesthesia successfully, for example colour-grapheme synesthesia using a colour picker (Rothen et al. 2013). The transposition of this method to SSS has however not yielded convincing criteria (see Ward; Rothen). Instead of a colour picker, SSS synesthetes are asked to position a set of inducers at their idiosyncratic concurrent locations on the screen. If each inducer is repeated several times, we can then compute the area between the responses for each inducer (i.e. a triangle if repeated three times). The sum or grand average of the triangle areas across several inducers in several conditions (i.e. numbers, weekdays and months) is then used to estimate individual consistency. The smaller the total, the more consistent the individual responses are. Despite yielding satisfactory results, this approach has several limits: for example, a participant can give every response at the same position on the screen and still obtain an excellent consistency score.

In the following we aim to take advantage of two properties of synesthetic responses: they give rise to a form (i.e. a number form, see Galton) that follows a sequential order (ordinality).

We harnessed a geographical package [ADD REF] to extract geometrical features from participant responses. For example, we can extract polygons from each condition and compute the areas of these polygons.

0. Load data:

In the following I load and merge the data from Ward, Rothen and Van Petersen. Data is stored in a full dataset ds (i.e. 1 row per trial) and a per-participant dataset ds_Quest (i.e. 1 row per participant).

## New names:
## • `` -> `...36`
## • `` -> `...37`
## [1] 0

Now we can enrich our dataset and perform several checks:

## Warning: Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
## ℹ Please use one dimensional logical vectors instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

We exclude 2 participants for whom we could not compute the z-scores, which indicates invalid responses.

Manually adjust some inconsistent screen sizes:

## [1] NA

Replicated Features

We initialize an empty dataframe to collect the ROC specifications of each feature:

1. Consistency:

Definition: each stimulus is represented by three xy coordinates, (x1, y1), (x2, y2), (x3, y3), from the three repetitions. For each stimulus, the area of the triangle bounded by the coordinates is calculated as follows:
\(Area = (x_1 y_2 + x_2 y_3 + x_3 y_1 - x_1 y_3 - x_2 y_1 - x_3 y_2) / 2\)

The mean area is calculated by summing the areas over all stimuli and dividing by 29. This is transformed into a percentage area taking into account the different pixel resolution of each participant:
Mean area = \((Summed\ area / 29) \times 100 / ScreenArea\), where \(ScreenArea = X_{pixels} \times Y_{pixels}\).
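The computation above can be sketched outside R as well. Below is a minimal Python illustration (the function names are hypothetical, not from the analysis code); it assumes three repetitions per stimulus and normalises by the participant's screen area:

```python
# Hypothetical sketch of the consistency score described above: mean triangle
# area across stimuli, expressed as a percentage of the screen area.
def triangle_area(p1, p2, p3):
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Shoelace formula; absolute value so vertex order does not matter.
    return abs(x1*y2 + x2*y3 + x3*y1 - x1*y3 - x2*y1 - x3*y2) / 2

def consistency_score(responses, screen_w, screen_h):
    """responses: dict stimulus -> [(x, y), (x, y), (x, y)], one per repetition."""
    summed = sum(triangle_area(*reps) for reps in responses.values())
    mean_area = summed / len(responses)   # 29 stimuli in the original task
    return mean_area * 100 / (screen_w * screen_h)
```

For instance, a single stimulus answered at (0, 0), (4, 0) and (0, 3) on a 100 x 100 screen spans a triangle of area 6, i.e. a score of 0.06% of the screen.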

1.1. Example

1.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls > cases

          threshold sensitivity specificity      ppv      npv
threshold 0.0765602    76.43098    65.31365 70.71651 71.65992

         Ctl          Syn
Ctl      94 (34.7%)   177 (65.3%)
Syn      227 (76.4%)  70 (23.6%)

2. Permuted consistency

This replicates Rothen's method. It might take some time to compute.

“Calculating chance levels of consistency To create permuted datasets for each participant: the 87 xy coordinates are randomly shuffled so they are no longer linked to the original data labels (“Monday”, “5”, “April”, etc.). The mean area of the triangles based on the shuffled coordinates is computed (as described above), and the whole process is repeated 1000 times to obtain a subject-specific distribution of chance levels of consistency. A z-score is calculated comparing the observed consistency against the mean and SD of the permuted data: \(Z = [(observed\ consistency) - (mean\ consistency\ of\ permuted\ data)] / (SD\ of\ permuted\ data)\)”
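The permutation scheme quoted above can be illustrated as follows. This is a hedged Python sketch (not the OSF code, and the function name is hypothetical): shuffle all responses, regroup them into triplets, recompute the mean triangle area, and z-score the observed consistency against this chance distribution:

```python
import random
import statistics

def permuted_z(observed, all_points, n_perm=1000, seed=1):
    """Sketch of the permutation z-score: observed mean triangle area
    compared against mean/SD of shuffled-triplet areas."""
    def tri(p1, p2, p3):
        (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
        return abs(x1*y2 + x2*y3 + x3*y1 - x1*y3 - x2*y1 - x3*y2) / 2
    rng = random.Random(seed)
    pts = list(all_points)
    chance = []
    for _ in range(n_perm):
        rng.shuffle(pts)
        # Regroup the shuffled coordinates into arbitrary triplets.
        areas = [tri(*pts[i:i + 3]) for i in range(0, len(pts) - 2, 3)]
        chance.append(statistics.mean(areas))
    return (observed - statistics.mean(chance)) / statistics.stdev(chance)
```

A consistent participant (tight, well-separated response clusters) gets a strongly negative z, since shuffling mixes clusters and inflates the triangle areas.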

Code retrieved from OSF (adapted here):

2.1. Example

2.2. ROC

3. SD as in Ward

As in Ward:

Specifically, the standard deviation of the x-coordinates and/or the standard deviation of the y-coordinates (measured across all trials) should exceed a proposed value of 0.075 for a normalized screen with width and height of 1 unit.

A participant who produced a horizontal straight-line form would have a very low standard deviation in the y-coordinates but a high standard deviation in x-coordinates, and a participant with a vertical line would have the reverse profile. A participant with a circular spatial form would be high on both. A participant who clicks randomly around the screen would also be high on both x and y standard deviation, but would fail the consistency tests (the triangles would be large).

Hence the SD is used in combination with consistency.
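Ward's spread criterion is simple to state computationally. A minimal Python sketch (hypothetical function name), assuming coordinates normalised to a 1 x 1 screen and the 0.075 cutoff quoted above:

```python
import statistics

def passes_sd_criterion(xs, ys, threshold=0.075):
    """Spread check on a screen normalised to 1 x 1: the SD of the
    x-coordinates and/or of the y-coordinates must exceed the threshold.
    Combined with consistency, this rejects the all-clicks-in-one-spot case."""
    return (statistics.pstdev(xs) > threshold) or (statistics.pstdev(ys) > threshold)
```

A horizontal straight-line form passes (high SD in x, low in y); a participant clicking always at the centre fails on both axes.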

3.1. Example

Would need an example with all responses in the centre.

3.2. ROC

WORK ON THIS:
I need to review the paper to try to replicate the ROC (hence with Ward’s dataset),
then generalize it to the full dataset.
I also need to figure out how to integrate the three different criteria.

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

          threshold sensitivity specificity      ppv      npv
threshold  147.8816    92.25589    41.69742 63.42593 83.08824

         Ctl          Syn
Ctl      113 (41.7%)  158 (58.3%)
Syn      23 (7.7%)    274 (92.3%)
## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

          threshold sensitivity specificity      ppv npv
threshold   91.6897    81.81818     59.7786 69.03409  75

         Ctl          Syn
Ctl      162 (59.8%)  109 (40.2%)
Syn      54 (18.2%)   243 (81.8%)

New form-based features

These new measures aim to take advantage of two properties:

  - ordinality
  - synesthetic forms

Hence we aim to exploit some geometrical features of the synesthetic forms. For example, we can define segments across the ordered stimuli (i.e. from 1 to 9, Monday to Sunday and January to December).

4. Segment self-intersection

An idea is to look into the lines and order of the forms, and to penalize responses whose lines cross (since we expect forms, crossing lines suggest that no form is formed). Needs refinement.

I think that the number of stimuli per condition should be taken into account (i.e. 9 numbers, 7 days, 12 months). Hence the count would need to be divided by this number of stimuli.

In each condition the connected x and y coordinates generate segments, hence the number of segments is length(stimuli) - 1. Moreover, currently, each stimulus is connected by 3 segments, one for each of the 3 repetitions. So, dividing by 3, we obtain the average number of segment crossings per condition. Next we sum these for each ID. Ideally we should compute the number of crossings across the repetitions; besides making the analysis more complex, it would also be computationally more demanding, and I don’t believe it would lead to a significant difference.

IMPORTANT: the data frame needs to be informed of the stimulus order to make sense!

The question is: should I sum the features across conditions or average them? I know some conditions contain responses at the exact same coordinates. Also, the conditions don’t have the same number of stimuli:

  - weeks: 7
  - months: 12
  - numbers: 10

Hence months are more likely to have self-intersections than weeks. Also, some participants did not respond in specific conditions. How important is that?
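Counting self-intersections of an ordered path reduces to testing pairs of non-adjacent segments. A minimal Python sketch (hypothetical names, assuming the stimuli are already in their sequential order), using the standard orientation test:

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (endpoints strictly on
    opposite sides of each other's supporting line)."""
    d1, d2 = _orient(a, b, c), _orient(a, b, d)
    d3, d4 = _orient(c, d, a), _orient(c, d, b)
    return d1 * d2 < 0 and d3 * d4 < 0

def self_intersections(path):
    """Count crossings among the non-adjacent segments of an ordered path
    (e.g. the responses for January..December in sequence order)."""
    segs = list(zip(path, path[1:]))
    count = 0
    for i in range(len(segs)):
        for j in range(i + 2, len(segs)):  # skip adjacent segments
            if segments_cross(*segs[i], *segs[j]):
                count += 1
    return count
```

A U-shaped path has 0 crossings; a bowtie-ordered path has 1, suggesting the responses do not trace a coherent form.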

4.1 Example

4.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls > cases

          threshold sensitivity specificity      ppv     npv
threshold  1.270115    76.76768    62.36162 69.09091 71.0084

         Ctl          Syn
Ctl      102 (37.6%)  169 (62.4%)
Syn      228 (76.8%)  69 (23.2%)

Segments (with sf)

Analyzing each repetition separately might favour horizontal positioning based on LTR order. For example, if the number 0 is always positioned on the left and 9 on the right (see MNL), there might be no intersections even without synesthesia. However, it is more unlikely that this strategy would work across repetitions (i.e. reproduce the same vertical positions). So I need to add a criterion on the number of intersections across repetitions. This would however only work if I exclude the segment joining the last stimulus of one repetition to the first of the next.

With 3 repetitions we have:

  - 1 vs 2
  - 2 vs 3
  - 3 vs 1
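The pairwise scheme above can be sketched in Python (hypothetical names; a standalone illustration, not the sf-based code). Each repetition is kept as its own path, so the segment joining the end of one repetition to the start of the next never exists:

```python
from itertools import combinations

def _orient(p, q, r):
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def _cross(a, b, c, d):
    return (_orient(a, b, c) * _orient(a, b, d) < 0 and
            _orient(c, d, a) * _orient(c, d, b) < 0)

def between_repetition_crossings(repetitions):
    """repetitions: list of ordered paths, one per repetition.
    Counts segment crossings for each pair (1 vs 2, 2 vs 3, 3 vs 1)."""
    total = 0
    for rep_a, rep_b in combinations(repetitions, 2):
        segs_a = list(zip(rep_a, rep_a[1:]))
        segs_b = list(zip(rep_b, rep_b[1:]))
        total += sum(_cross(*sa, *sb) for sa in segs_a for sb in segs_b)
    return total
```

Three parallel repetitions of the same line give 0; two repetitions drawn with opposite slopes cross once.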

We will take advantage of the sf package.

## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
## Spherical geometry (s2) switched off

##                 X           Y L1
##  [1,] -0.33939732  0.03257203  1
##  [2,] -0.06443808 -1.45129143  1
##  [3,]  0.80626616 -0.54448598  1
##  [4,]  1.05067436 -0.88453802  1
##  [5,]  0.95902129 -1.47190064  1
##  [6,]  0.60768449  1.05272817  1
##  [7,] -0.27829526  1.00120513  1
##  [8,] -1.39340771  0.64054387  1
##  [9,] -2.20300990  0.54780240  1
## [10,] -1.92805067 -0.20443393  1
## [1] 30
## [1] 63
## [1] 168

5. Between repetitions:

5.1. Example

TO ADD

5.2. ROC

6. Segment length (should replicate Rothen)
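This measure can be sketched as the summed length of the segments joining consecutive ordered stimuli. A minimal Python illustration (hypothetical name; not Rothen's code):

```python
import math

def total_segment_length(path):
    """Summed Euclidean length of the segments joining consecutive ordered
    stimuli (1 -> 2 -> ... -> n); consistent forms should give similar
    totals across repetitions."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))
```

For the path (0, 0) -> (3, 4) -> (3, 5) this gives 5 + 1 = 6.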

6.1. Example

6.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls > cases

          threshold sensitivity specificity      ppv      npv
threshold  7.856126    83.83838    49.07749 64.34109 73.48066

         Ctl          Syn
Ctl      138 (50.9%)  133 (49.1%)
Syn      249 (83.8%)  48 (16.2%)

7. Distances between repetitions
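The between-repetition distance can be sketched as the mean pairwise distance between the repeated responses to each stimulus. A minimal Python illustration (hypothetical name):

```python
import math
from itertools import combinations
from statistics import mean

def mean_between_repetition_distance(responses):
    """responses: dict stimulus -> list of (x, y), one per repetition.
    Mean pairwise distance between the repetitions of each stimulus;
    small values mean the same stimulus lands in the same place."""
    dists = [math.dist(p, q)
             for reps in responses.values()
             for p, q in combinations(reps, 2)]
    return mean(dists)
```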

7.1. Example

7.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

threshold sensitivity specificity      ppv      npv
     -Inf         100           0 52.28873      NaN
      Inf           0         100      NaN 47.71127
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 3 has 2 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 4 has 2 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 5 has 2 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 6 has 2 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, 7, , value = list("BtwDist", 41.89, :
## replacement element 7 has 2 rows to replace 1 rows

Polygon-based geometries

8. Polygon area

8.1. Example

8.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

          threshold sensitivity specificity   ppv      npv
threshold  1.288086    59.25926    70.47970 68.75 61.21795

         Ctl          Syn
Ctl      191 (70.5%)  80 (29.5%)
Syn      121 (40.7%)  176 (59.3%)

9. Polygon simplicity

9.1. Example

9.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

          threshold sensitivity specificity      ppv      npv
threshold 0.1666667    74.41077    56.45756 65.19174 66.81223

         Ctl          Syn
Ctl      153 (56.5%)  118 (43.5%)
Syn      76 (25.6%)   221 (74.4%)

10. Topological validity

We test whether the structure is topologically valid:

From the package description: “For projected geometries, st_make_valid uses the lwgeom_makevalid method also used by the PostGIS command ST_makevalid if the GEOS version linked to is smaller than 3.8.0, and otherwise the version shipped in GEOS; for geometries having ellipsoidal coordinates s2::s2_rebuild is being used.” From https://postgis.net/docs/ST_IsValid.html: the value is well-formed and valid in 2D according to the OGC (Open Geospatial Consortium) rules.

10.1. Example

10.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

          threshold sensitivity specificity      ppv      npv
threshold       1.5    70.70707    75.27675 75.81227 70.10309

         Ctl          Syn
Ctl      204 (75.3%)  67 (24.7%)
Syn      87 (29.3%)   210 (70.7%)

11. Topological DE-9IM

See: https://r-spatial.org/book/03-Geometries.html#sec-opgeom See: https://en.wikipedia.org/wiki/DE-9IM

DE-9IM is a standard describing the topological relation between two geometries. It is computed by st_relate, which returns a 3 × 3 matrix (DE9IM) for each relation:

\(\operatorname{DE9IM}(a,b) = \begin{bmatrix} \dim(I(a)\cap I(b)) & \dim(I(a)\cap B(b)) & \dim(I(a)\cap E(b)) \\ \dim(B(a)\cap I(b)) & \dim(B(a)\cap B(b)) & \dim(B(a)\cap E(b)) \\ \dim(E(a)\cap I(b)) & \dim(E(a)\cap B(b)) & \dim(E(a)\cap E(b)) \end{bmatrix}\)

\(\dim\) is the dimension of the intersection (\(\cap\)) of the interior (I), boundary (B), and exterior (E) of geometries a and b.

Hence it returns a spatial predicate defined over these domains:

##           Ctl  Syn       Subs
## 212101212 952 1568  0.6071429
## 2121012F2   2    2  1.0000000
## 21210F212   0    1  0.0000000
## 212111212   4    8  0.5000000
## 212F01212  18    8  2.2500000
## 212F01FF2  15    2  7.5000000
## 212F0F212  12    2  6.0000000
## 212F0FFF2   5    1  5.0000000
## 212F1FFF2   1    0        Inf
## 2F2101212  13    1 13.0000000
## 2F21012F2  10    4  2.5000000
## 2F2F01212  16    2  8.0000000
## 2F2F01FF2  41    4 10.2500000
## 2FF10F212  12    1 12.0000000
## 2FF10F2F2   3    0        Inf
## 2FF11F212   1    0        Inf
## 2FFF0F212  32    3 10.6666667
## 2FFF0FFF2 138   19  7.2631579
## 2FFF1FFF2 626  853  0.7338804
## FF2F01212   6    0        Inf
## FF2F01FF2  27    6  4.5000000
## FF2F11212   1    0        Inf
## FF2FF1212 294  124  2.3709677
## FFFF0F212  26    2 13.0000000
## FFFF0FFF2 184   62  2.9677419
##            Ctl  Syn       Subs
## 212101212 1009 1629  0.6193984
## 2121012F2    0    1  0.0000000
## 21210F212    2    3  0.6666667
## 21210F2F2    1    0        Inf
## 212111212   12    9  1.3333333
## 212F01212   12    2  6.0000000
## 212F01FF2   11    0        Inf
## 212F0F212   12    2  6.0000000
## 212F0FFF2    5    1  5.0000000
## 212F11212    1    0        Inf
## 212F11FF2    1    0        Inf
## 2F2101212   13    7  1.8571429
## 2F21012F2   27    6  4.5000000
## 2F2111212    1    0        Inf
## 2F2F01212   20    1 20.0000000
## 2F2F01FF2   44   10  4.4000000
## 2FF10F212   21    2 10.5000000
## 2FF10F2F2    8    1  8.0000000
## 2FF11F2F2    1    0        Inf
## 2FFF0F212   59    6  9.8333333
## 2FFF0FFF2  173   31  5.5806452
## 2FFF1FFF2  591  846  0.6985816
## FF2F01212    4    2  2.0000000
## FF2F01FF2   40    4 10.0000000
## FF2F11212    3    0        Inf
## FF2FF1212   99   39  2.5384615
## FFFF0F212   31    8  3.8750000
## FFFF0FFF2  238   63  3.7777778
##           Ctl  Syn       Subs
## 212101212 939 1569  0.5984704
## 2121012F2   0    2  0.0000000
## 21210F212   0    1  0.0000000
## 21210F2F2   1    0        Inf
## 212111212   8    7  1.1428571
## 212F01212  12    3  4.0000000
## 212F01FF2  14    1 14.0000000
## 212F0F212  23    7  3.2857143
## 212F0FFF2   5    0        Inf
## 212F11212   1    0        Inf
## 2F2101212  18    5  3.6000000
## 2F21012F2  11    1 11.0000000
## 2F2111212   1    0        Inf
## 2F2F01212  12    1 12.0000000
## 2F2F01FF2  44    3 14.6666667
## 2FF10F212   7    0        Inf
## 2FF10F2F2   4    1  4.0000000
## 2FFF0F212  39    8  4.8750000
## 2FFF0FFF2 147   27  5.4444444
## 2FFF1FFF2 611  853  0.7162954
## FF2F01212   6    2  3.0000000
## FF2F01FF2  30    4  7.5000000
## FF2F11212   2    0        Inf
## FF2FF1212 279  125  2.2320000
## FFFF0F212  40    4 10.0000000
## FFFF0FFF2 185   49  3.7755102
##             Ctl  Syn      Subs
## 2121012122  939 1569 0.5984704
## 212101212   952 1568 0.6071429
## 2121012121 1009 1629 0.6193984
## 2FFF1FFF21  591  846 0.6985816
## 2FFF1FFF22  611  853 0.7162954
## 2FFF1FFF2   626  853 0.7338804
## FF2FF12122  279  125 2.2320000
## FF2FF1212   294  124 2.3709677

2FFF1FFF2:

  - S1 interior vs. S2 interior: the interiors intersect in 2 dimensions (2).
  - S1 interior vs. S2 boundary: no intersection (F).
  - S1 interior vs. S2 exterior: no intersection (F).
  - S1 boundary vs. S2 interior: no intersection (F).
  - S1 boundary vs. S2 boundary: a 1-dimensional intersection occurs, e.g. they share a common line segment (1).
  - S1 boundary vs. S2 exterior: no intersection (F).
  - S1 exterior vs. S2 interior: no intersection (F).
  - S1 exterior vs. S2 boundary: no intersection (F).
  - S1 exterior vs. S2 exterior: the exteriors intersect in 2 dimensions (2).

2FFF0FFF2:

  - 2: the intersection of the first geometry’s interior and the second geometry’s interior creates a polygon (a two-dimensional intersection).
  - F: the interior of the first geometry does not intersect the boundary of the second.
  - F: the interior of the first geometry does not intersect the exterior of the second.
  - F: the boundary of the first geometry does not intersect the interior of the second.
  - 0: the boundary of the first geometry intersects the boundary of the second at a point (a zero-dimensional intersection).
  - F: the boundary of the first geometry does not intersect the exterior of the second.
  - F: the exterior of the first geometry does not intersect the interior of the second.
  - F: the exterior of the first geometry does not intersect the boundary of the second.
  - 2: the exterior of the first geometry intersects the exterior of the second, creating a polygon (a two-dimensional intersection).

FFFF0FFF2:

  - F: the intersection of the interior of the first geometry with the interior of the second is empty.
  - F: the intersection of the interior of the first geometry with the boundary of the second is empty.
  - F: the intersection of the interior of the first geometry with the exterior of the second is empty.
  - F: the intersection of the boundary of the first geometry with the interior of the second is empty.
  - 0: the intersection of the boundary of the first geometry with the boundary of the second is a point (0-dimensional).
  - F: the intersection of the boundary of the first geometry with the exterior of the second is empty.
  - F: the intersection of the exterior of the first geometry with the interior of the second is empty.
  - F: the intersection of the exterior of the first geometry with the boundary of the second is empty.
  - 2: the intersection of the exterior of the first geometry with the exterior of the second is a 2-dimensional area.
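These decodings follow mechanically from the 9-character string. A minimal Python sketch (hypothetical function name) that unpacks such a code, as returned by st_relate, into the labelled 3 x 3 matrix:

```python
def de9im_matrix(code):
    """Unpack a 9-character DE-9IM string into a dict keyed by
    (part of geometry 1, part of geometry 2), row-major as in the
    DE9IM matrix: interior, boundary, exterior."""
    assert len(code) == 9
    parts = ["interior", "boundary", "exterior"]
    return {(a, b): code[i * 3 + j]
            for i, a in enumerate(parts)
            for j, b in enumerate(parts)}
```

For example, de9im_matrix("2FFF1FFF2") maps ("interior", "interior") to "2" and ("boundary", "boundary") to "1", matching the first decoding above.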

11.1 Example

11.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

          threshold sensitivity specificity      ppv      npv
threshold   17078.5    93.60269    23.24723 57.20165 76.82927

12. Is clockwise (?)
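The winding direction of an ordered form can be read off the sign of the shoelace sum. A minimal Python sketch (hypothetical name); note it assumes mathematical y-up coordinates, whereas screen coordinates are y-down, which flips the winding:

```python
def is_clockwise(ring):
    """Sign of twice the signed shoelace area: with y pointing up, a
    negative sum means the ordered stimuli run clockwise."""
    signed2 = sum(x1 * y2 - x2 * y1
                  for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))
    return signed2 < 0
```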

12.1. Example

12.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

          threshold sensitivity specificity      ppv      npv
threshold       2.5    75.75758    41.32841 58.59375 60.86957

         Ctl          Syn
Ctl      112 (41.3%)  159 (58.7%)
Syn      72 (24.2%)   225 (75.8%)

Convex hull

13. Convex Hull Area
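The convex hull area of all responses can be sketched without a geometry package. A minimal Python illustration (hypothetical names) using Andrew's monotone chain plus the shoelace formula:

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def half(seq):
        h = []
        for p in seq:
            # Pop while the last turn is not a strict left turn.
            while len(h) >= 2 and ((h[-1][0] - h[-2][0]) * (p[1] - h[-2][1]) -
                                   (h[-1][1] - h[-2][1]) * (p[0] - h[-2][0])) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]
    return half(pts) + half(pts[::-1])

def hull_area(points):
    """Shoelace area of the convex hull of all responses."""
    h = convex_hull(points)
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(h, h[1:] + h[:1]))) / 2
```

A unit square of responses with one extra click inside has hull area 1, since interior points do not change the hull.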

13.1 Example

13.2. ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

          threshold sensitivity specificity      ppv      npv
threshold  1.491143    74.74747    47.97048 61.15702 63.41463

         Ctl         Syn
Ctl      130 (48%)   141 (52%)
Syn      75 (25.3%)  222 (74.7%)

Others

14. PCA Anisotropy
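One way to define a PCA-based anisotropy index is the ratio of the two eigenvalues of the 2 x 2 covariance matrix of the responses. This is a hedged sketch of that idea (hypothetical name; the analysis code may define anisotropy differently):

```python
import math

def pca_anisotropy(points):
    """Ratio of the eigenvalues of the 2 x 2 covariance matrix:
    1 means isotropic spread; large values mean an elongated form
    (e.g. a line-like number form)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]].
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    root = math.sqrt(max(tr * tr / 4 - det, 0.0))
    lam1, lam2 = tr / 2 + root, tr / 2 - root
    return lam1 / lam2 if lam2 > 0 else math.inf
```

The four corners of a square give a ratio of 1 (isotropic); perfectly collinear points give an unbounded ratio.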

14.1. Example

14.2 ROC

Angles

15. Angles

## Warning in st_cast.sf(ds_segm, "POINT"): repeating attributes for all
## sub-geometries for which they may not be constant
## Warning: Removed 16104 rows containing non-finite outside the scale range
## (`stat_density()`).

## [1] 444

15.1 Example

15.2 ROC

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

## 
## Call:
## roc.formula(formula = group ~ quadrant, data = ds[!is.nan(ds$quadrant),     ], percent = TRUE, ci = TRUE, boot.n = 100, ci.alpha = 0.9,     stratified = FALSE, plot = TRUE, auc.polygon = TRUE, max.auc.polygon = TRUE,     grid = TRUE, print.auc = TRUE, show.thres = TRUE)
## 
## Data: quadrant in 15621 controls (group Ctl) < 21563 cases (group Syn).
## Area under the curve: 52.87%
## 95% CI: 52.31%-53.44% (DeLong)

Compare all features:

Summary table:

   Feature            AUC    threshold sensitivity specificity      ppv      npv  high_ci   low_ci
10 isValidPoly    77.7473 1.500000e+00    70.70707    75.27675 75.81227 70.10309 74.01116 77.74734
4  LineInter      72.8354 1.270115e+00    76.76768    62.36162 69.09091 71.00840 68.60864 72.83536
1  Consistency_zs 70.3685 7.656020e-02    76.43098    65.31365 70.71651 71.65992 65.81407 70.36851
8  areaPoly       69.8958 1.288086e+00    59.25926    70.47970 68.75000 61.21795 65.64189 69.89576
3  SD_ID_y        69.6088 9.168970e+01    81.81818    59.77860 69.03409 75.00000 65.10650 69.60876
9  isSimplePoly   69.3572 1.666667e-01    74.41077    56.45756 65.19174 66.81223 65.06818 69.35716
2  SD_ID_x        64.3321 1.478816e+02    92.25589    41.69742 63.42593 83.08824 59.65966 64.33213
6  Segm_leng      63.7755 7.856126e+00    83.83838    49.07749 64.34109 73.48066 59.07019 63.77552
12 isClockwise    61.4279 2.500000e+00    75.75758    41.32841 58.59375 60.86957 56.87297 61.42793
13 areaVhull      56.0389 1.491143e+00    74.74747    47.97048 61.15702 63.41463 51.11374 56.03886
11 relateReciepe  55.6077 1.707850e+04    93.60269    23.24723 57.20165 76.82927 51.33587 55.60774
7  BtwDist        41.89           -Inf   100.00000     0.00000 52.28873      NaN 37.66036 41.88999
5  NA                  NA           NA          NA          NA       NA       NA       NA       NA

We could also compare the features pairwise:

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls < cases

## Setting levels: control = Ctl, case = Syn
## Setting direction: controls > cases

## 
##  Bootstrap test for two correlated ROC curves
## 
## data:  Sum_isValidStruct and Consistency_zs in ds_Q by group (Ctl, Syn)
## D = 5.9995, boot.n = 100, boot.stratified = 1, p-value = 1.979e-09
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## pAUC (100-90 specificity) of roc1 pAUC (100-90 specificity) of roc2 
##                        3.05545340                        0.06696734

Or… use machine learning.

Example:

With data V3 and cross-validation re-added: works!

Following https://www.tidymodels.org/start/recipes/

## ── Attaching packages ────────────────────────────────────── tidymodels 1.4.1 ──
## ✔ broom        1.0.10     ✔ rsample      1.3.1 
## ✔ dials        1.4.2      ✔ tailor       0.1.0 
## ✔ infer        1.0.9      ✔ tune         2.0.0 
## ✔ modeldata    1.5.1      ✔ workflows    1.3.0 
## ✔ parsnip      1.3.3      ✔ workflowsets 1.1.1 
## ✔ purrr        1.1.0      ✔ yardstick    1.3.2 
## ✔ recipes      1.3.1
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ infer::conf_int() masks papaja::conf_int()
## ✖ purrr::discard()  masks scales::discard()
## ✖ dplyr::filter()   masks stats::filter()
## ✖ dplyr::lag()      masks stats::lag()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step()   masks stats::step()
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## Loaded glmnet 4.1-10
## 
## Attaching package: 'vip'
## The following object is masked from 'package:utils':
## 
##     vi
## # A tibble: 75 × 8
##         penalty mixture .metric     .estimator  mean     n std_err .config      
##           <dbl>   <dbl> <chr>       <chr>      <dbl> <int>   <dbl> <chr>        
##  1 0.0000000001    0    accuracy    binary     0.725     5 0.0285  pre0_mod01_p…
##  2 0.0000000001    0    brier_class binary     0.178     5 0.00880 pre0_mod01_p…
##  3 0.0000000001    0    roc_auc     binary     0.809     5 0.0189  pre0_mod01_p…
##  4 0.0000000001    0.25 accuracy    binary     0.732     5 0.0320  pre0_mod02_p…
##  5 0.0000000001    0.25 brier_class binary     0.181     5 0.0103  pre0_mod02_p…
##  6 0.0000000001    0.25 roc_auc     binary     0.803     5 0.0209  pre0_mod02_p…
##  7 0.0000000001    0.5  accuracy    binary     0.732     5 0.0320  pre0_mod03_p…
##  8 0.0000000001    0.5  brier_class binary     0.181     5 0.0104  pre0_mod03_p…
##  9 0.0000000001    0.5  roc_auc     binary     0.803     5 0.0209  pre0_mod03_p…
## 10 0.0000000001    0.75 accuracy    binary     0.736     5 0.0312  pre0_mod04_p…
## # ℹ 65 more rows
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## # A tibble: 3 × 4
##   .metric     .estimator .estimate .config        
##   <chr>       <chr>          <dbl> <chr>          
## 1 accuracy    binary         0.736 pre0_mod0_post0
## 2 roc_auc     binary         0.813 pre0_mod0_post0
## 3 brier_class binary         0.179 pre0_mod0_post0

## ══ Workflow [trained] ══════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: logistic_reg()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 1 Recipe Step
## 
## • step_normalize()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## 
## Call:  glmnet::glmnet(x = maybe_matrix(x), y = y, family = "binomial",      alpha = ~0.75) 
## 
##    Df  %Dev   Lambda
## 1   0  0.00 0.305200
## 2   1  2.07 0.278100
## 3   1  3.92 0.253400
## 4   1  5.56 0.230800
## 5   2  7.28 0.210300
## 6   2  9.00 0.191700
## 7   2 10.52 0.174600
## 8   3 11.94 0.159100
## 9   3 13.27 0.145000
## 10  3 14.44 0.132100
## 11  3 15.45 0.120400
## 12  3 16.33 0.109700
## 13  4 17.22 0.099930
## 14  4 18.04 0.091050
## 15  4 18.75 0.082960
## 16  4 19.37 0.075590
## 17  4 19.90 0.068880
## 18  5 20.41 0.062760
## 19  5 20.96 0.057180
## 20  6 21.46 0.052100
## 21  8 21.91 0.047470
## 22  8 22.32 0.043260
## 23  8 22.68 0.039410
## 24  8 22.98 0.035910
## 25  8 23.25 0.032720
## 26  9 23.48 0.029820
## 27  9 23.70 0.027170
## 28 10 23.89 0.024750
## 29 10 24.07 0.022550
## 30 10 24.22 0.020550
## 31 11 24.35 0.018720
## 32 11 24.54 0.017060
## 33 11 24.72 0.015550
## 34 11 24.88 0.014160
## 35 11 25.02 0.012910
## 36 11 25.14 0.011760
## 37 11 25.25 0.010720
## 38 11 25.34 0.009763
## 39 11 25.42 0.008896
## 40 11 25.49 0.008106
## 41 11 25.55 0.007385
## 42 11 25.60 0.006729
## 43 11 25.65 0.006132
## 44 11 25.69 0.005587
## 45 11 25.72 0.005091
## 46 11 25.75 0.004638
## 
## ...
## and 22 more lines.

## # A tibble: 0 × 5
## # ℹ 5 variables: term <chr>, step <dbl>, estimate <dbl>, lambda <dbl>,
## #   dev.ratio <dbl>

Discussion

Of the different features we extracted, topological validity across the repetitions led to the largest area under the curve. The optimal cutoff was exactly 1.5, leading to a sensitivity () and specificity ().

The optimal criterion needs to be informed about the order between inducers (i.e. to construct the polygons) and interestingly suggests that synesthetic inducers are structurally mapped following topological rules analogous to geographical space structures, hence suggesting a spatial nature for the synesthetic forms of space sequence synesthetes.

References

Galton, Francis. 1880. “Visualised Numerals.” Nature 21 (533): 252–56. https://doi.org/10.1038/021252a0.
Rothen, Nicolas, Kristin Jünemann, Andy D. Mealor, Vera Burckhardt, and Jamie Ward. 2016. “The Sensitivity and Specificity of a Diagnostic Test of Sequence-Space Synesthesia.” Behavior Research Methods 48 (4): 1476–81. https://doi.org/10.3758/s13428-015-0656-2.
Rothen, Nicolas, Anil K. Seth, Christoph Witzel, and Jamie Ward. 2013. “Diagnosing Synaesthesia with Online Colour Pickers: Maximising Sensitivity and Specificity.” Journal of Neuroscience Methods 215 (1): 156–60. https://doi.org/10.1016/j.jneumeth.2013.02.009.
Van Petersen, Eline, Mareike Altgassen, Rob Van Lier, and Tessa M. Van Leeuwen. 2020b. “Enhanced Spatial Navigation Skills in Sequence-Space Synesthetes.” Cortex 130 (September): 49–63. https://doi.org/10.1016/j.cortex.2020.04.034.
Ward, Jamie. n.d. “Optimizing a Measure of Consistency for Sequence-Space Synaesthesia.” https://doi.org/10.31234/osf.io/5cnr7.